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ABSTRACT 

We present a new technique called quantile analysis to classify spectral properties of X-ray sources 
with limited statistics. The quantile analysis is superior to the conventional approaches such as X- 
ray hardness ratio or X-ray color analysis to study relatively faint sources or to investigate a certain 
phase or state of a source in detail, where poor statistics does not allow spectral fitting using a model. 
Instead of working with predetermined energy bands, we determine the energy values that divide the 
detected photons into predetermined fractions of the total counts such as median (50%), tercile (33% 
& 67%), and quartile (25% & 75%). We use these quantiles as an indicator of the X-ray hardness or 
color of the source. We show that the median is an improved substitute for the conventional X-ray 
hardness ratio. The median and other quantiles form a phase space, similar to the conventional X- 
ray color-color diagrams. The quantile-based phase space is more evenly sensitive over various spectral 
shapes than the conventional color-color diagrams, and it is naturally arranged to properly represent 
the statistical similarity of various spectral shapes. We demonstrate the new technique in the 0.3-8 
keV energy range using Chandra ACIS-S detector response function and a typical aperture photometry 
involving background subtraction. The technique can be applied in any energy band, provided the energy 
distribution of photons can be obtained. 

Subject headings: 



1. INTRODUCTION 

A major obstacle for understanding the nature of the 
faintest X-ray sources is poor statistics, which prevents 
the usual practice of spectral analysis such as fitting with 
known models. Even for apparently bright sources, as we 
investigate the sources in detail, we frequently need to 
divide source photons in various phases or states of the 
sources and the success of such an analysis is often limited 
by statistics. The common practice to extract spectral 
properties of X-ray sources with poor statistics is to cal- 
culate X-ray hardness or color of the sources (e.g. Schulz 
et al. 1989; Kim et al. 1992; Netzer et al. 1994; Prestwich 
et al. 2003). In this conventional method, the full-energy 
range is divided into two or three sub-bands and the de- 
tected source photons are counted separately in each band. 
The ratio of these counts is defined as X-ray hardness or 
color, to serve as an indicator of the spectral properties of 
the source. 

In principle, one can constrain a meaningful X-ray hard- 
ness or color to a source if at least one source photon is 
detected in each of at least two bands. However, the equiv- 
alent requirement for total counts in the full-energy band 
can be rather demanding and it is strongly spectral de- 
pendent. Fig. 1 shows an example of a X-ray color-color 
diagram (Kim et al. 2003). In the figure, we divide the full- 
energy range (0.3-8.0 keV) into three sub-energy bands; 
0.3-0.9 (S), 0.9-2.5 (M), and 2.5-8.0 keV (H). The ratio of 
the net source counts in S to M band is defined as the soft 
X-ray color (a;-axis in the figure), and the ratio of the net 
counts in M to H band as the hard X-ray color (j/-axis). 
The energy range in this example is the region where the 
Chandra ACIS-S detectors are sensitive, but the technique 



described in this paper is not bounded to any particular 
energy range, provided the energy distribution of photons 
can be obtained. 

The grid pattern in the figure represents the true loca- 
tion for the sources with the power-law spectra governed 
by two parameters: power-law index (F) and absorbing 
column depths (Ah) along the line of the sight. The grid 
pattern is drawn for an ideal detector response, which is 
constant over the full-energy band. The grid pattern ap- 
pears to be properly spaced to the changes of the spectral 
parameters and such an arrangement suggests that this 
color-color diagram may be an ideal way to classify sources 
with poor statistics. 

However, the appearance can be deceiving, which is 
hinted by the uneven sizes of error bars in the figure. The 
error bar near each grid node represents the central 68% 
of simulation results from the spectral shape at the grid 
node. Each simulation has 1000 source counts in the full- 
energy range with no background counts, and thus the size 
of the symbol represents the size of typical error bars for 
1000 count sources in the diagram. 

In the figure, there are two kinds of error bars for some 
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thick error bars represent the central 68% distribution of 
the simulation results that have "proper" color values (de- 
fined here as at least one photon in each of all three bands), 
and the thin error bars show the 68% distribution of all 
the simulation trials (10,000) for each spectral model, i.e. 
a distribution of the central 6827 trials. The two error 
bars should be identical if all the trials produce a proper 
soft and hard color. 

The right panel in Fig. 1 shows the minimum counts in 
the full-energy band to have at least one count in each of 
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three bands. The figure indicates that the required mini- 
mum counts are dramatically different among the models 
in the grid. For the power-law spectra with F = and 
A^H ~ 10^^ cm~^, more than 500 counts are required in 
order to have reasonable color values, while in the case 
of F = 2 and A^h ~ 10^° cm~^, a few counts are sufh- 
cient. Because of statistical fluctuations, for some models, 
even 1000 counts in the full-energy range do not guarantee 
positive net counts in all three bands, which explains why 
some trials fail to produce proper colors. 

The spectral- model dependence of error bars in this kind 
of color-color diagram is inevitable and the dependence is 
determined by the choice of the sub-energy bands for a 
full-energy range. In this example, the sub bands are cho- 
sen so that the diagram is mostly sensitive near F = 2 and 
TVh < 10^^ cm~^. In fact, it is not possible to select sub 
bands to have uniform sensitivity over different spectral 
shapes. 

Fig. 1 clearly indicates that the conventional color-color 
diagram (CCCD) is heavily biased, which is very unfortu- 
nate when trying to extract unknown spectral properties 
from the sources. The heavy requirement on the total 
counts for certain spectral shapes defeats the purpose of 
the diagram since the large total counts may allow nom- 
inal spectral analysis such as spectral fitting with known 
models. One might have to repeat the analysis with dif- 
ferent choices of sub-bands to explore all the interesting 
possibilities of spectral shapes. 

2. QUANTILES 

In order to overcome the selection effects originating 
from the predetermined sub-energy bands, we propose to 
use the energy value to divide photons into predetermined 
fractions. We choose fractions to take full advantage of the 
given statistics, such as 50% (median), 33% & 67% (ter- 
cile), and 25% & 75% (quartile), although any quantile 
may be used. We use the corresponding energy values - 
quantiles - as an indicator of the X-ray hardness or color of 
the source. In particular, we will show below that the me- 
dian, Q50, is an improved substitute for the conventional 
X-ray hardness. 

Let E^% be the energy below which the net counts is 
x% of the total counts and we define quantile Qx 
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where -Bio and Bup is the lower and upper boundary of 
the full-energy band respectively (0.3 and 8.0 keV in this 
example). The algorithm for estimation of quantiles and 
their errors relevant to X-ray astronomy is given in the ap- 
pendix^. Unlike the conventional X-ray hardness or colors, 
for calculating quantiles, there is no spectral dependence 
of required minimum counts. The required minimum only 
depends on types of quantiles: two counts for median and 
terciles, and three for quartiles. 



Fig. 2 shows an example of quantile-based color-color 
diagrams (QCCDs) using the median ■m{= Q50) for the 
X-axis and the ratio of two quartiles Q25/Q75 for the y- 
axis (see §5 for the motivation of the choice of the axes). 
The grid pattern in Fig. 2 is drawn for the same spectral 
parameters as in Fig. 1 with five additional cases for N^i 
= 5 X 10^^ cm^'^. Note that the approximate axes of F 
and iVn are rotated ~ 90deg from those in Fig. 1 . The er- 
ror bars are drawn for the central 68% of the same 10,000 
simulation results of the spectral shape at each grid node, 
and each simulation run contains 1000 source counts with 
no background counts in the full-energy band. The rela- 
tively similar size of the error bars for 1000 count sources 
in the figure indicates that there is no spectral dependent 
selection effect. Note that there is no need of distinction 
for thin and thick error bars in this figure because all the 
trials produce proper quantiles. 

The grid pattern of power-law spectra in Fig. 2 appears 
to be less intuitively arranged than the one in Fig. 1. How- 
ever, we believe that the proximity of any two spectra in 
the quantile-based diagram accurately exhibits the simi- 
larity of the two spectral shapes (as folded through the 
detector response, which is constant in this example). For 
example, in the case for F = in Fig. 2, the separation in 
phase space by various TVh values is much smaller than in 
the case for F > 1. Note that the column depth (iVn) in 
the considered range can change mainly soft X-rays (< 2 
keV) and the spectrum for F = is less dominated by soft 
X-rays than that for F > 1. In other words, the overall 
effects of TVh on the spectral shape would be much smaller 
in the case for F = than in the case for F > 1. Therefore, 
the grid spacing in Fig. 2 indeed reveals the appropriate 
statistical power needed to discern these spectral shapes, 
despite the degeneracy arising from utilizing only three 
variables (quantiles) extracted from a spectrum. 

3. COLOR-COLOR DIAGRAM EXAMPLE 

Now let us consider more realistic examples. We intro- 
duce the Chandra ACIS-S response function"^; we use the 
pre-fiight calibration data for energy resolution^ and the 
energy dependent effective area from PIMMS version 2.3^. 

Fig. 3 shows the simulation results of the spectra with 
100 and 50 source counts with no background counts. The 
error bars again indicate the interval that encloses the cen- 
tral 68% of the simulation results for each model. The re- 
sults are shown for both the conventional and new color- 
color diagrams in the figure. The grid patterns in the 
figures are for the same power-law models and they look 
different from the previous examples because of the Chan- 
dra ACIS-S response function. 

The analysis of faint sources is often limited by back- 
ground fiuctuations. In contrast to Fig. 3, we allow for 
large background counts in the source region (e.g. a point 
source in a bright diffuse emission region). The top two 
panels in Fig. 4 shows the simulation results of the spectra 
with 100 source counts and 50 background counts in the 



The quantile analysis software (IDL and perl versions) is available at http://hea-www.harvard.6du/ChaMPlane/quantile. 
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See Chandra Proposers' Observatory Guide at http://cxc.harvard.edu/. 
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~ 100 - 250 eV FWHM. See http://cxc.harvard.edu/cal/Acls/. 



Portable, Interactive Multi-Mission Simulator. See http://heasarc.gsfc.nasa.gov/docs/software/tools/piimns.htiiil. 
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Fig. 1. — Conventional color-color diagram (CCCD); two colors are defined by the net counts (cl, c2, and c3) in three energy bands (0.3-0.9, 
0.9-2.5, and 2.5-8.0 keV). The left panel shows simulation results of 20 spectral shapes for an ideal detector with a uniform response function. 
The error bar represents the central 68% of simulation results for each of 20 power-law spectra governed by two parameters — power-law index 
(r) and column depth (A'^h) along the line of the sight. The thick error bars are for the 68% of the cases with proper color values (at least one 
count in each of the three bands) and the thin error bars for considering 10,000 trials (see text). Each simulation run contains 1000 source 
counts in the full 0.3-8.0 keV energy range. In this example, the two error bars are identical except for the case of F =0 Sz Nn =10^^ cm~^ 
and F =4 &: A^jj =10^^^ cm~-^, where many simulation runs result in null count in at least one of sub-bands. The grid pattern represents the 
true location of these spectra in the diagram. The right panel shows the required minimum counts in the full-energy band in order to have 
at least one count in each of three bands. The contour lines quantize the gray (color) scale for easier interpretation. See electronic ApJ for 
color version of the figure as well as other figures. 
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Fig. 2. — Same as Fig. 1 but quantile-based color-color diagram (QCCD) based on the median Q50 and the ratio of two quartiles Q25/Q75- 
See §5 for the origin of the logarithmic form for the x-axis. The energy scale across the top shows the median energy values (i?5o%)- The 
simulation results are given from the same simulation sets in Fig. 1 (1000 source counts). Note that there are five more spectral shapes with 
A'^H = 5 X 10^^ cm~-^ than in Fig. 1. The grid pattern represents the true location of these spectra in the diagram for an ideal detector. Note 
the relative uniformity of error bars, compared to Fig. 1. 



source region, and the bottom two panel shows the case 
of 50 source counts and 25 background counts. For back- 
ground subtraction, we set the background region to be 
five times larger than the source region. The background 
region contains 250 (top panels) and 125 background (bot- 
tom panels) counts respectively. Note these are relatively 
high backgrounds for a typical Chandra source and thus 
illustrate the worst case of a point source superimposed in 
background diffuse emission. The background photons are 
sampled from a power-law spectrum with F = and iVfj — 
0, which is folded through the Chandra ACIS-S response 
function. 

In the case of the CCCD in Figs. 3 and 4, one can no- 
tice that some of the error bars lie away from their true 
location (grid node). The severe requirement for the total 
counts (> 100) for some of the spectral models results in 
only a lower or upper limit for the color in many simula- 
tion runs. Cases with proper colors for these models are 
greatly influenced by statistical fluctuations because of low 
source counts in one or two sub-bands, and thus the esti- 
mated colors fail to produce the true value. It is evident 
that this conventional diagram is more sensitive towards 
F - 2 and A^H < 10^^ cm^^. 

In the case of the new quantile-based diagram, the error 
bars stay at the correct location and the size of the error 
bars are relatively uniform across the model phase space, 
indicating that the phase space is properly arranged. The 
quantile-based diagram shows more consistent results re- 
gardless of background. 

4. HARDNESS RATIO EXAMPLE 

When we explore the spectral properties of sources with 
an even smaller number of detected photons, the only 
available tool is the use of a single hardness ratio, which 
requires only two sub-energy bands. Here we use net 
counts 5 in a 0.3-2.0 keV soft band and net counts H 
in a 2.0-8.0 keV hard band. Two popular definitions for 
hardness ratio exist in the literature: HR = HjS and 
IIR=(iJ — 5')/(i7 + 5). We compare the median with the 
conventional hardness ratio using the first definition. 

The left panel in Fig. 5 shows the performance of the 
conventional hardness ratio as a function of the total 
counts folded through the ACIS-S response. The plot con- 
tains three spectral shapes with F = 2. The shaded regions 
represent the central 68% of the simulation results for a 
given total net counts (except for the case of 7J or S' < 0). 
The simulation is done for the case without background 
(top panels) and for the case that the source region con- 
tains an equal number of source and background counts 
(bottom panels), where we perform a background subtrac- 
tion identical to the one described in the previous section. 
The background follows the same spectrum as in the pre- 
vious examples. 

Similar to the CCCD, the required minimum of the to- 
tal counts for the conventional hardness ratio depends on 
spectral shape. The right panels in Fig. 5 show the frac- 
tion of the simulations that have at least one count in both 
the soft and hard bands (otherwise HR = or oo). The 
plot indicates that one cannot assign a proper hardness 
ratio value in a substantial fraction of cases when the to- 

6 
The inverse of hyperbolic tangent; ~ log(m,) when m — > , and ~ - 



tal net counts are less than 100. In the case of A^h — 10^" 
cni^^, many simulation runs result in _ff = (too soft), 
and for A^h = 5 x 10^^ cm"^, 5* = (too hard). A set of 
predetermined bands keeps the hardness ratio meaningful 
only within a certain range of spectral shapes. In order to 
compensate for such a limitation, one needs to repeat the 
analysis with different choices of bands, which in turn will 
have a different limitation. 

Even for the cases with a proper hardness ratio value 
(the left panel in Fig. 5), the hardness ratio distribution 
of two spectral types (A^h = 10^^ and 5 x 10^^ cm~^) over- 
laps substantially when the total net counts are less than 
40 in the case of high background. Below 10 counts, it is 
difficult to relate the distributions to their true value. 

Fig. 6 shows the performance of the new hardness de- 
fined by the median. First, there is no loss of events. Sec- 
ond, three spectra are well separated down to less than 10 
counts regardless of background. Even at two or three net 
counts, each distribution is distinct and well related to its 
true value. 

5. DISCUSSION 

The newly defined hardness and color-color diagrams 
using the median and other quantiles are far superior to 
the conventional ones. One can choose specific quantiles 
and their combinations relevant to one's needs. We believe 
the new phase space by the median and quartiles is very 
useful in general, as summarized in Fig. 7. 

For a given spectrum, various quantiles are not indepen- 
dent variables, unlike the counts in different energy bands. 
However, the ratio of two quartiles is mostly independent 
from the median, which makes them good candidate pa- 
rameters for color-color diagrams. In essence, the median 
Qso shows how dense the first (or second) half of the spec- 
trum is and the quartile ratio Q25/Q75 shows a similar 
measure of the middle half of the spectrum. 

For the a;-axis in Fig. 7, one can simply use ni (= Qso) 
on a log scale to explore the wide range of the hardness. 
However, a simple log scale compresses the phase space for 
relatively hard spectra (m > 0.5; note < m < 1). Our 
expression^, log(TO/(l — ttt-)), shows both the soft and hard 
phase space equally well. 

In Fig. 7, a fiat spectrum lies at x~0 and y—1 in the 
diagram (0 < y < 3). The spectrum changes from soft to 
hard as one goes from left to right in the diagram, and it 
changes from concave-upwards to concave-downwards, as 
one goes from bottom to top. The examples in the pre- 
vious sections explore a soft part of this phase diagram, 
which is modeled by a power law. 

In the case of a narrow energy range, where an emission 
or absorption line is expected on a relatively flat contin- 
uum, one can use the QCCD to explore a spectral line 
feature even with limited statistics that normally forbids 
normal spectral fitting. In this case, the median (x-axis) 
in the QCCD can be a good measure of line shift, and 
the quartile ratio (y-axis) can be a good measure of line 
broadening. 

The right panel in Fig. 7 shows the overlay of the grid 
patterns for typical spectral models - power law, thermal 
Bremsstrahlung, and black body emission. For high count 

log(l — m) when m — > 1. 
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Fig. 3. — Realistic example of the color-color diagram with no background: 100 (top panels) and 50 (bottom panels) source counts for each 
spectral model folded through the Chandra ACIS-S response function. The left two panels show the CCCD and the right two panels show the 
QCCD. The grid pattern of both cases are different from the previous example because of the ACIS-S response function. In the case of the 
CCCD, the average of the simulation results arc often severely different from the true value (grid node) due to the selection effect inherent to 
the band choice. The selection effect also causes the large difference between thick and thin error bars, which indicates that only a fraction 
of simulations produce proper colors. The fraction can be even smaller than 68% of the total 10,000 runs, in which case the thin error bar 
represents the full range of the simulation outcomes with proper colors. The quantile-based diagram provides the correct results regardless of 
spectral shapes. 
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Fig. 4. — Realistic example of the color-color diagram with relatively high background: 100 count source with 50 background counts (top 
panels) and 50 count source with 25 background counts (bottom panels). The left two panels show the CCCD, and the right two panels 
show the QCCD. The background follows a flat spectrum, and both source and background photons are folded through the Chandra ACIS-S 
response function. The diagrams are generated after the background subtraction (see the appendix). Similar to the previous results, the new 
technique performs well throughout the phase space. In particular, note the relatively uniform error bars across the entire grid as well as the 
relative lack of points outside the grid. 
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Fig. 5. — Conventional hardness ratio calculation. The soft band (5) is 0.3-2.0 keV and hard band {H) 2.0-8.0 keV, and the Chandra 
ACIS-S response function is used. The left two panels show a conventional hardness distribution as a function of total net counts and the 
right two panels show the fraction of the simulations that have H > and S > for a given total net count. The top two panels are for 
the case of no background, and in the bottom two panels, for a given net count, the source region contains an equal number of source and 
background counts, and we perform a background subtraction identical to the one described in the previous examples. Three spectral shapes 
(r = 2) are simulated, and the solid horizontal lines in the left panels represent the true hardness ratio of each spectrum. The shaded region 
represents the central 68% of simulated results (only for H > and 5 > 0) at a given total net count. A large fraction of trials do not produce 
a proper hardness ratio when the total counts are less than 100. Below 10 counts, the hardness distribution hardly bears any relation to its 
true value. Blending of the central 68% distributions starts at ~ 40 counts for A'h = lO^'' and 5 X lO^^ cm~2 for the case with background. 
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and the grid patterns for power law (PL), thermal Bremsstrahlung (TB), black body (BE) models in a 0.3-8.0 keV range ideal detector or 
incoming flux space (right). For PL, the grid patterns are for F = 4, 3, 2, 1 & along the positive x-axis direction, for TB, kT = 0.2, 0.5, 1, 
2, 4, & 10.0 keV, and for BB, kT = 0.1, 0.2, 0.5, 1, 2, & 4 keV. For all the models, A^h = 0.1, 1, 5, & 10 xlO^i cm-2 along the positive y 
direction. 



sources, one can use such a diagram to find relevant models 
for the spectra before detailed analysis. 

Finally, we investigate how the detector energy resolu- 
tion affects the performance of the quantile-based diagram. 
The left panel in Fig. 8 shows the grid patterns of the 
power-law emission models for detectors with various en- 
ergy resolutions but with the same detection efficiency of 
the Chandra ACIS-S detector. Starting with the energy 
resolution of the Chandra ACIS-S detector in the previ- 
ous examples (Figs. 3, 4, 5 and 6; FWHM /^E - 150 
eV at 1.5 keV and ~ 200 eV at 4.5 keV), we have suc- 
cessively decreased the energy resolution by multiplying 
the energy resolution at each energy by a constant factor. 
Each pattern is labeled by the energy resolution (AE/E) 
at 1.5 keV. As the energy resolution decreases, the grid 
pattern shrinks. The pattern will shrink down to a point 
{x,y) — {0,l) in the diagram for a detector with no energy 
resolution. 

The right panel in Fig. 8 shows the 68% of the simula- 
tion results for a detector with AE/E = 100% at 1.5 keV. 
The size of the 68% distribution in this example is more 
or less similar to that in the case of the regular Chan- 
dra ACIS-S detector with AE/E = 10% (the top-right 



panel in the Fig. 3). The similarity is due to the fact 
that, unlike the grid pattern, the dispersion of the sim- 
ulation results (or the error size of a data model point) 
for each model is mainly due to statistical fluctuations for 
low count sources. In summary, as the energy resolution 
decreases, the relative distance between various models in 
the diagram decreases, but the error size of the data re- 
mains roughly the same, and thus the spectral sensitivity 
of the diagram decreases. 

Note that the overall size of the grid patterns in the left 
panel of Fig. 8 is more or less similar when AE/E < 20% 
(at 1.5 keV). We expect that for a detector with energy 
resolution AE, the overall size is dependent on a quantity 
AE/{Eup — E\o) since quantiles are determined over the 
full energy range. Therefore, the quantile-based diagram 
is quite robust against finite energy resolutions of typical 
X-ray detectors. 
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APPENDIX 
Many routines have been developed to estimate quantiles and the related statistics (Wilcox 1997). We use a simple 
interpolation technique based on order statistics to estimate quantiles. For real data, we also need to have a reliable 
estimate for the errors of quantile values, for which we employ the technique by Maritz & Jarrett (1978). For simplicity, 
we assume that we measure the energy value of each photon. 



QUANTILE ESTIMATION USING ORDER STATISTICS 

In the literature, many quantile estimation algorithms often assume that the lowest value (energy) of the given distri- 
bution is the (equivalent) lower bound of the distribution and the highest value the upper bound. In X-ray astronomy, 
the lower {E\o) and upper bound (-Eup) of the energy range is usually set by the instrument or user selection, where 
these bounds may or may not be the lowest and highest energies of the detected photons. We can explicitly impose this 
boundary condition by assigning 0% and 100% quantiles to E\o and E^p respectively. 

We sort the detected photons in ascending order of energy, and we assign Ex% to the energy Ei of the i-th photon as 

^ ^ 1 X 2i~l 
Ei = E,r% , where = , 

] X/O, -^gg 2iV ' ^ 

and A^ is the total number of net counts. Using E\o~Eq% and _Bup=£^ioo%7 one can interpolate any quantiles from the 
above relation of x and Ei. With the definitions of E\o=Eq% and £^up=£'ioo%7 the above interpolation is very robust. 
One can even calculate quantiles without any detected photons (although not meaningful), the result of which is identical 
to the case of a flat spectrum. In the case of only one detected photon at _E, the above relation reduces to E=E^Qy„. 
Therefore, the distribution of the median from one count sources with the same spectra is the source spectrum itself. 

The above interpolation for quantile E^% essentially uses only two energy values, Ei and i^i+i, where (2z — 1)/2N < 
x/lOO < (2z + l)/2N. In order to take advantage of other energy values, one can use more sophisticated techniques like 
that of Harrell & Davis (1982). In many cases, the Harrell-Davis method estimates quantiles with smaller uncertainties 
than the simple order statistics technique, but because of smoothing effects in the HarrcU-Davis method, there are cases 
that the simple order statistics performs better, such as distributions containing discontinuous breaks (Wilcox 1997). In 
real data, the finite detector resolution tends to smooth out any discontinuity in the spectra. Our simulation shows that 
about 10-15% better performance is achieved by the Harrell-Davis method compared to the simple order statistics for the 
cases of 50 source photons and 25 background photons. The results in the paper are generated by the order statistics 
technique. 
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Fig. 8. — The grid patterns for various energy resolutions with the same Chandra ACIS-S efficiency (left) and the 68% distribution of the 
simulation results for 100 count sources for a detector with AE/E = 100% {AE =FWHM) at 1.5 keV (right). Each grid pattern in the left 
panel is generated for a different energy resolution, labeled by the energy resolution [AE/E) at 1.5 keV. As the resolution decreases, the 
pattern shrinks, and if there is no energy resolution in the detector, the pattern shrinks to a point at (a;, y) = (0,1). The right figure shows the 
effects of limited photon statistics dominate those of energy resolution (compare with Fig. 3, with AE/E =10% at 1.5 keV) in determining 
the 68% error bar sizes. See the electronic ApJ for color version of the figure which distiguished the model grids more easily. 
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Fig. 9. — Accuracy of the error estimates by the Maritz-Jarrett method: the ratio of the error estimate by the Maritz-Jarrett method ((Ti,^j) 
to that from simulations (a^i^, the 68% distribution) for the cases with no background (left) and for the cases with high background (right). 
See the "background subtraction" section in the appendix for the background subtraction routine. 



11 



QUANTILE ERROR ESTIMATION USING THE MARITZ-JARRETT METHOD 



Once quantile values are acquired, one can always rely on simulations to estimate their uncertainty (or error) using a 
suspected spectral model with parameters derived from the QCCD. If no single model stands out as a primary candidate, 
one can derive a final uncertainty of the quantiles by combining the results of multiple simulations from a number of 
models. However, error estimation by simulations can be slow, cumbersome, and even redundant, considering that the 
quantile errors are more sensitive to the number of counts than the choice of spectral shapes (cf. Figs. 3 and 4). A 
quick and rough error estimate is often sufficient, and so we introduce a simple quantile error estimate technique - the 
Maritz-Jarrett method. The results (error bars and shades in Fig. 1 to 6) in the main text indicate the interval that 
encloses the central 68% of the simulation results, and were not driven by the Maritz-Jarrett method. 

The Maritz-Jarrett method uses a technique similar to the HarrcU-Davis method of quantile estimation. Both methods 
rely on L-estimators (linear sums of order statistics) using Beta functions. We sort the photons in the ascending order of 
energy from z = 1 to TV. Then, we apply the Maritz-Jarrett method with small modifications^ as follows. For Ex%, we 
set 

M^Nx + 0.5 

a = M - 1 

b = N - M. 

We then define Wi using the incomplete beta function B, 

Wi = Bia, b, j/j+i) - B{a, b, yi) 

where yi — i/N and r(n + 1) = n! (the gamma function). Using Wi and Ei, we calculate the L-estimators Cfe, 

N 



Ck ==^Wi£:f, and then 



<y{E,%) = ^Jc2-cl 



The error estimate by the Maritz-Jarrett method requires at least 3 counts for medians, 5 counts for terciles, and 6 counts 
for quartiles (a, 6 > 1) . 

Fig. 9 shows the accuracy of the error estimates by the Maritz-Jarrett method for F = 2 and A^h = 5 x 10^^ cm^-^. In the 
case of no background, the Maritz-Jarrett method is accurate when the total net counts are greater than ^ 30. It tends to 
overestimate the errors when the total net counts are below ~ 30. In the case of high background, it tends to underestimate 
the errors overall since the adopted background subtraction procedure (see below) does not inherit background statistics 
for the error estimates. We find that multiplying by an empirical factor -^/l + Nhkg/Nsrc can compensate, approximately, 
for the underestimation (A^src^ total source counts, A'^bkg^ total background counts). 

For the error of the ratio Q25/Q75, since these two quartiles are not independent variables, the simple quadratic 
combination from two quartile errors overestimates the error of the ratio (^ 20 - 30% for the examples in Figs. 3 and 4). 
One can find more sophisticated techniques such as Bayesian statistics to estimate quantile errors in the literature. For 
example, Babu (1999) and the reference therein discuss the limitation of bootstrap estimation and show other techniques 
such as the half-sample method. 

BACKGROUND SUBTRACTION 

For background subtraction, we calculate quantiles at a set of finely stepped fractions separately for photons in the 
source region and the background region. Then, by a simple linear interpolation, we establish the integrated counts 
IC{>E) (number of photons with energy greater than E) as a function of energy for both regions. Now at each energy, 
one can subtract the integrated counts of the background region from that of the source region with a proper ratio 
factor of the area of the two regions. Because of statistical fluctuations, the subtracted integrated net counts may not be 
monotonically increasing from E\o to -Eup- Therefore we force a monotonic behavior by setting IC{>E) > IC{>E') if 
IC{>E) < IC{>E') and E > £". Such a requirement can underestimate the quantiles overall, so we repeat the above 
using IC{<E) and force a monotonic decrease. Then, quantiles for the net distribution are given by 

N + IC{>E)-IC{<E) 
2N 
Ex% = E 
where N is the total net counts. If there is no statistical fluctuation, IC(>E) = N — IC{<E). For the error estimation, 
we need to know the energy of each source photon, which will be lost from background subtraction. So we generate a set 
of N energy values for source photons matching the above quantile relation and then apply the above error estimation 
technique using the Maritz-Jarrett method. 

7 
Originally M = [Nx + 0.5], where [z] is the integer part of z. 



